1000 results for Spam detection


Relevance:

100.00%

Publisher:

Abstract:

Although many incidents of fake online consumer reviews have been reported, very few studies have examined the trustworthiness of online consumer reviews. One reason is the lack of an effective computational method to separate untruthful reviews (i.e., spam) from legitimate ones (i.e., ham), given that prominent spam features are often missing in online reviews. The main contribution of our research is a novel review spam detection method underpinned by an unsupervised inferential language modeling framework. Another contribution is a high-order concept association mining method, which provides the essential term association knowledge to bootstrap the performance of untruthful review detection. Our experimental results confirm that the proposed inferential language model, equipped with high-order concept association knowledge, is effective in untruthful review detection when compared with other baseline methods.

Relevance:

100.00%

Publisher:

Abstract:

In recent years, Twitter has changed the way people communicate and get news in their daily lives. Because of its popularity, Twitter has also become a main target for spamming activities. To stop spammers, Twitter uses Google SafeBrowsing to detect and block spam links. Although blacklists can block malicious URLs embedded in tweets, their lag time hinders their ability to protect users in real time. Researchers have therefore begun to apply different machine learning algorithms to detect Twitter spam. However, there is no comprehensive evaluation of each algorithm's performance for real-time Twitter spam detection, owing to the lack of a large ground-truth dataset. To carry out a thorough evaluation, we collected a large dataset of over 600 million public tweets. We further labelled around 6.5 million spam tweets and extracted 12 lightweight features that can be used for online detection. In addition, we conducted a number of experiments on six machine learning algorithms under various conditions to better understand their effectiveness and weaknesses for timely Twitter spam detection. We will make our labelled dataset available to researchers who are interested in validating or extending our work.

Relevance:

100.00%

Publisher:

Abstract:

As an important source of real-time information dissemination in recent years, Twitter is inevitably a prime target of spammers. It has been shown that the damage caused by Twitter spam can reach far beyond the social media platform itself. To mitigate the threat, many recent studies use machine learning techniques to classify Twitter spam and report very satisfactory results. However, most of these studies overlook a fundamental issue that is widely seen in real-world Twitter data: the class imbalance problem. In this paper, we show that the unequal distribution between spam and non-spam classes in the data has a great impact on the spam detection rate. To address the problem, we propose an ensemble learning approach that involves three steps. In the first step, we adjust the class distribution in the imbalanced data set using various strategies, including random oversampling, random undersampling and fuzzy-based oversampling. In the next step, a classification model is built upon each of the redistributed data sets. In the final step, a majority voting scheme is introduced to combine all the classification models. Experimental results obtained using real-world Twitter data indicate that the proposed approach can significantly improve the spam detection rate in data sets with imbalanced class distribution.
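
The three steps in this abstract (redistribute the classes, train one model per redistributed set, combine by majority vote) can be illustrated with a minimal Python sketch. It is not the authors' implementation: the nearest-centroid base learner, the function names, and the use of the original data as a third view (standing in for fuzzy-based oversampling, which is not reproduced here) are all assumptions made for illustration.

```python
import random
from collections import Counter

def split_classes(data):
    """Return (minority, majority) lists of (features, label) rows; label 1 = spam."""
    spam = [row for row in data if row[1] == 1]
    ham = [row for row in data if row[1] == 0]
    return (spam, ham) if len(spam) < len(ham) else (ham, spam)

def random_oversample(data, rng):
    """Duplicate random minority rows until both classes have the same size."""
    minority, majority = split_classes(data)
    extra = [rng.choice(minority) for _ in range(len(majority) - len(minority))]
    return majority + minority + extra

def random_undersample(data, rng):
    """Keep a random subset of the majority class the size of the minority class."""
    minority, majority = split_classes(data)
    return minority + rng.sample(majority, len(minority))

def train_centroids(data):
    """Hypothetical base learner: one mean feature vector per class."""
    groups = {}
    for features, label in data:
        groups.setdefault(label, []).append(features)
    return {label: [sum(col) / len(rows) for col in zip(*rows)]
            for label, rows in groups.items()}

def predict(model, features):
    """Return the label whose centroid is closest (squared Euclidean distance)."""
    return min(model, key=lambda label: sum((a - b) ** 2
                                            for a, b in zip(model[label], features)))

def majority_vote(models, features):
    """Combine the individual predictions; three models means no ties."""
    votes = Counter(predict(m, features) for m in models)
    return votes.most_common(1)[0][0]

def build_ensemble(data, seed=0):
    """Train one model per view of the data (original, oversampled, undersampled)."""
    rng = random.Random(seed)
    views = [data, random_oversample(data, rng), random_undersample(data, rng)]
    return [train_centroids(view) for view in views]
```

A real system would swap the centroid learner for a stronger classifier; the resampling and voting logic would stay the same.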

Relevance:

80.00%

Publisher:

Abstract:

The popularity of Twitter attracts more and more spammers. Spammers send unwanted tweets to Twitter users to promote websites or services, which are harmful to normal users. To stop spammers, researchers have proposed a number of mechanisms. Recent work focuses on applying machine learning techniques to Twitter spam detection. However, tweets are retrieved as a stream, and Twitter provides the Streaming API for developers and researchers to access public tweets in real time. A performance evaluation of existing machine learning-based streaming spam detection methods is lacking. In this paper, we bridge the gap by carrying out a performance evaluation from three different aspects: data, feature, and model. A large ground-truth dataset of over 600 million public tweets was created by using a commercial URL-based security tool. For real-time spam detection, we further extracted 12 lightweight features for tweet representation. Spam detection was then transformed into a binary classification problem in the feature space that can be solved by conventional machine learning algorithms. We evaluated the impact of different factors on spam detection performance, including the spam to nonspam ratio, feature discretization, training data size, data sampling, time-related data, and machine learning algorithms. The results show that streaming spam tweet detection is still a big challenge, and that a robust detection technique should take into account the three aspects of data, feature, and model.
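
To make "lightweight features for tweet representation" concrete, here is a small Python sketch that maps a tweet's text to a numeric feature vector. The abstract does not list its 12 features, so the six features below and the name `extract_features` are hypothetical examples of cheap, text-only counts, not the authors' feature set.

```python
import re

def extract_features(text):
    """Map raw tweet text to a dict of cheap numeric features (all hypothetical)."""
    return {
        "n_chars": len(text),                                  # tweet length
        "n_words": len(text.split()),                          # whitespace-separated tokens
        "n_digits": sum(ch.isdigit() for ch in text),          # digit characters
        "n_urls": len(re.findall(r"https?://\S+", text)),      # embedded links
        "n_hashtags": len(re.findall(r"#\w+", text)),          # hashtags
        "n_mentions": len(re.findall(r"@\w+", text)),          # user mentions
    }
```

Each feature is computed in a single pass over the text, which is the property that matters for streaming detection; the resulting vector can be fed to any conventional binary classifier.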

Relevance:

80.00%

Publisher:

Abstract:

Spam has become a critical problem in online social networks. This paper focuses on Twitter spam detection. Recent research applies machine learning techniques to Twitter spam detection, making use of the statistical features of tweets. We observe that existing machine learning based detection methods suffer from the problem of Twitter spam drift, i.e., the statistical properties of spam tweets vary over time. To avoid this problem, an effective solution is to train a new Twitter spam classifier every day. However, this faces the challenge of a small and imbalanced training set, because labelling spam samples is time-consuming. This paper proposes a new method to address this challenge. The new method employs two new techniques: fuzzy-based redistribution and asymmetric sampling. We develop a fuzzy-based information decomposition technique to redistribute the spam class and generate more spam samples. Moreover, an asymmetric sampling technique is proposed to rebalance the sizes of the spam and non-spam samples in the training data. Finally, we apply an ensemble technique to combine the spam classifiers trained on two different training sets. A number of experiments are performed on a real-world 10-day ground-truth dataset to evaluate the new method. Experimental results show that the new method can significantly improve the detection performance for drifting Twitter spam.

Relevance:

70.00%

Publisher:

Abstract:

In the era of Web 2.0, huge volumes of consumer reviews are posted to the Internet every day. Manual approaches to detecting and analyzing fake reviews (i.e., spam) are not practical due to the problem of information overload. However, the design and development of automated methods of detecting fake reviews is a challenging research problem. The main reason is that fake reviews are specifically composed to mislead readers, so they may appear the same as legitimate reviews (i.e., ham). As a result, discriminatory features that would enable individual reviews to be classified as spam or ham may not be available. Guided by the design science research methodology, the main contribution of this study is the design and instantiation of novel computational models for detecting fake reviews. In particular, a novel text mining model is developed and integrated into a semantic language model for the detection of untruthful reviews. The models are then evaluated based on a real-world dataset collected from amazon.com. The results of our experiments confirm that the proposed models outperform other well-known baseline models in detecting fake reviews. To the best of our knowledge, the work discussed in this article represents the first successful attempt to apply text mining methods and semantic language models to the detection of fake consumer reviews. A managerial implication of our research is that firms can apply our design artifacts to monitor online consumer reviews to develop effective marketing or product design strategies based on genuine consumer feedback posted to the Internet.

Relevance:

70.00%

Publisher:

Abstract:

Spam is commonly defined as unsolicited email messages, and the goal of spam filtering is to distinguish between spam and legitimate email messages. Much work has been done to filter spam from legitimate emails using machine learning algorithms, and substantial performance has been achieved at the cost of some false positives (FP). In spam detection, however, false positives are sometimes unacceptable. In this paper, an adaptive spam filtering model based on machine learning (ML) algorithms is proposed, which achieves better accuracy by reducing false positives. The model combines individual and combined filtering approaches built from existing well-known ML algorithms. It considers both individual and collective outputs and passes them to an analyzer. A dynamic feature selection (DFS) technique is also proposed in this paper to improve accuracy.

Relevance:

70.00%

Publisher:

Abstract:

Anti-spam technology has developed rapidly in recent years. With the emerging applications of machine learning in diverse fields, researchers and manufacturers around the world have tried a large number of related algorithms to prevent spam. In this paper, we design an effective anti-spam protection system, SpamCooling, based on active learning and parallel heterogeneous ensemble learning techniques. The system adopts a batch method to filter spam and can be easily integrated with existing mail clients (MUA). It can actively obtain user feedback to provide users with a personalized spam filtering experience. The parallel heterogeneous ensemble method helps the system achieve a high spam detection rate as well as a low ham misclassification rate.

Relevance:

70.00%

Publisher:

Abstract:

Spam has become a critical problem on Twitter. To stop spammers, security companies apply blacklisting services to filter spam links. However, over 90% of victims visit a new malicious link before it is blocked by blacklists. To overcome the limitations of blacklists, researchers have proposed a number of mechanisms based on statistical features and applied machine learning techniques to detect Twitter spam. In our large labelled dataset, we observe that the statistical properties of spam tweets vary over time, and thus the performance of existing ML based classifiers is poor. This phenomenon is referred to as 'Twitter Spam Drift'. To tackle this problem, we carry out a deep analysis of 1 million spam tweets and 1 million non-spam tweets, and propose an asymmetric self-learning (ASL) approach. The proposed ASL can discover new information about changed Twitter spam and incorporate it into the classifier training process. A number of experiments are performed to evaluate the ASL approach. The results show that the ASL approach can significantly improve the spam detection accuracy of traditional ML algorithms.
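
The abstract does not spell out how ASL works. The Python sketch below shows one plausible reading, stated as an assumption: a classifier trained on labelled data scores unlabelled tweets, and only the tweets it predicts as spam are fed back into the training set (hence "asymmetric") before retraining. The nearest-centroid learner and all function names are hypothetical.

```python
def train(data):
    """Hypothetical base learner: mean feature vector per class (label 1 = spam)."""
    groups = {}
    for features, label in data:
        groups.setdefault(label, []).append(features)
    return {label: [sum(col) / len(rows) for col in zip(*rows)]
            for label, rows in groups.items()}

def predict(model, features):
    """Return the label of the nearest class centroid."""
    return min(model, key=lambda label: sum((a - b) ** 2
                                            for a, b in zip(model[label], features)))

def asymmetric_self_learning(labelled, unlabelled, rounds=3):
    """Assumed ASL loop: add only predicted-spam samples to the training set."""
    train_set = list(labelled)
    pool = list(unlabelled)
    for _ in range(rounds):
        model = train(train_set)
        new_spam = [x for x in pool if predict(model, x) == 1]
        if not new_spam:
            break  # nothing new to learn from the pool
        train_set += [(x, 1) for x in new_spam]
        pool = [x for x in pool if predict(model, x) == 0]
    return train(train_set)
```

In this reading, the spam centroid moves toward recently observed spam, while predicted non-spam samples are never added, so the non-spam class stays anchored to the trusted labels.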

Relevance:

60.00%

Publisher:

Abstract:

Thesis (Master's)--University of Washington, 2012

Relevance:

60.00%

Publisher:

Abstract:

In this paper we propose a new technique for email classification based on grey list (GL) analysis of user emails. The technique analyzes the output emails of an integrated model that uses multiple classifiers built on statistical learning algorithms. The GL is a list of classifier outputs that are considered neither true positives (TP) nor true negatives (TN) but fall between the two. Much work has been done to filter spam from legitimate emails using classification algorithms, and substantial performance has been achieved at the cost of some false positives (FP). In spam detection, however, false positives are sometimes unacceptable. The proposed technique provides a list of output emails, called the "grey list (GL)", to the analyser, which decides the status of these emails. It has been shown that the proposed technique performs much better than existing email classification systems in terms of reducing false positives and improving accuracy.
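
The grey-list idea can be sketched in a few lines of Python. This is an illustrative reading of the abstract, not the authors' system: an email goes to the grey list whenever the classifiers disagree, and the toy rule-based classifiers and all names are assumptions.

```python
def triage(emails, classifiers):
    """Split emails into (spam, ham, grey) based on classifier agreement."""
    spam, ham, grey = [], [], []
    for email in emails:
        votes = {clf(email) for clf in classifiers}
        if votes == {"spam"}:
            spam.append(email)   # every classifier says spam
        elif votes == {"ham"}:
            ham.append(email)    # every classifier says ham
        else:
            grey.append(email)   # disagreement: defer to the analyser
    return spam, ham, grey

def by_keyword(email):
    """Toy classifier: flags emails that mention 'free'."""
    return "spam" if "free" in email.lower() else "ham"

def by_link(email):
    """Toy classifier: flags emails that contain a link."""
    return "spam" if "http" in email else "ham"
```

For example, `triage(["FREE money http://x", "Meeting at noon", "Free lunch on Friday"], [by_keyword, by_link])` places the first email in spam, the second in ham, and the third on the grey list, since only one classifier flags it. Deferring the disputed cases is what keeps a single over-eager classifier from producing a false positive.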

Relevance:

60.00%

Publisher:

Abstract:

Social networks have grown enormously in recent years. This trend not only allows us to connect and share information efficiently, but also reveals potential benefits in dealing with several social issues, such as earthquake detection, social spam detection, flu pandemic tracking, and media monitoring. In this paper, we propose a new way of using social networks. By implementing what we call a Virtual Celebrator Machine (VCM), we let everyone connected to this machine through social networking share their cultural experiences and points of view about certain social events, locally or globally. In this way, we provide a means to reinforce relationships and connections between people virtually, which we believe would help cultural heritage preservation flourish.

Relevance:

40.00%

Publisher:

Abstract:

This work introduces two major changes to the conventional protocol for designing plastic antibodies: (i) the imprinted sites were created with charged monomers while the surrounding environment was tailored using neutral material; and (ii) the protein was removed from its imprinted site by means of a protease, aiming at preserving the polymeric network of the plastic antibody. To our knowledge, these approaches were never presented before and the resulting material was named here as smart plastic antibody material (SPAM). As proof of concept, SPAM was tailored on top of disposable gold-screen printed electrodes (Au-SPE), following a bottom-up approach, for targeting myoglobin (Myo) in a point-of-care context. The existence of imprinted sites was checked by comparing a SPAM modified surface to a negative control, consisting of similar material where the template was omitted from the procedure and called non-imprinted materials (NIMs). All stages of the creation of the SPAM and NIM on the Au layer were followed by both electrochemical impedance spectroscopy (EIS) and cyclic voltammetry (CV). AFM imaging was also performed to characterize the topography of the surface. There are two major reasons supporting the fact that plastic antibodies were effectively designed by the above approach: (i) they were visualized for the first time by AFM, being present only in the SPAM network; and (ii) only the SPAM material was able to rebind to the target protein and produce a linear electrical response against EIS and square wave voltammetry (SWV) assays, with NIMs showing a similar-to-random behavior. The SPAM/Au-SPE devices displayed linear responses to Myo in EIS and SWV assays down to 3.5 μg/mL and 0.58 μg/mL, respectively, with detection limits of 1.5 and 0.28 μg/mL. 
SPAM materials also showed negligible interference from troponin T (TnT), bovine serum albumin (BSA) and urea under SWV assays, showing promising results for point-of-care applications when applied to spiked biological fluids.

Relevance:

40.00%

Publisher:

Abstract:

Spam emails (unsolicited or junk emails) impose extremely heavy annual costs in terms of time, storage space, and money on private users and companies. To fight the spam problem effectively, it is not enough to stop spam messages from being delivered to the user's inbox. It is necessary either to try to find and prosecute the spammers, who generally hide behind complex networks of infected devices, or to analyze spammer behavior in order to find appropriate defense strategies. However, such a task is difficult because of camouflage techniques, which makes a manual analysis of correlated spam necessary to find the spammers. To facilitate such an analysis, which must be performed on large quantities of unclassified emails, we propose a categorical clustering methodology, named CCTree, that divides a large volume of spam into campaigns based on their structural similarity. We show the effectiveness and efficiency of our proposed clustering algorithm through several experiments. Next, a self-learning approach is proposed to label spam campaigns according to the spammers' goal, for example, phishing. The labelled spam campaigns are used to train a classifier, which can be applied to classify new spam emails. In addition, the labelled campaigns, together with a set of four other ranking criteria, are ordered according to investigators' priorities. Finally, a semiring-based structure is proposed for the abstract representation of CCTree. The abstract schema of CCTree, named CCTree term, is applied to formalize the parallelization of CCTree.
Through a number of mathematical analyses and experimental results, we show the efficiency and effectiveness of the proposed framework.
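
The abstract does not describe how the categorical clustering tree is built. The Python sketch below shows one way such a tree could work, under stated assumptions: each node is split on the attribute with the highest Shannon entropy, and a node becomes a leaf (a campaign) when its summed attribute entropy falls to a threshold or it holds too few emails. The splitting rule, the stopping rule, the parameter names, and the function names are all assumptions for illustration.

```python
import math
from collections import Counter, defaultdict

def entropy(values):
    """Shannon entropy (in bits) of a list of categorical values."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())

def node_impurity(rows, attrs):
    """Assumed purity measure: sum of per-attribute entropies in the node."""
    return sum(entropy([row[a] for row in rows]) for a in attrs)

def cctree(rows, attrs, max_impurity=0.0, min_size=2):
    """Recursively split rows; return the list of leaf clusters (campaigns)."""
    if len(rows) <= min_size or node_impurity(rows, attrs) <= max_impurity:
        return [rows]
    # Split on the attribute whose values are most mixed in this node.
    split = max(attrs, key=lambda a: entropy([row[a] for row in rows]))
    if entropy([row[split] for row in rows]) == 0:
        return [rows]  # no attribute can separate the rows any further
    groups = defaultdict(list)
    for row in rows:
        groups[row[split]].append(row)
    leaves = []
    for group in groups.values():
        leaves.extend(cctree(group, attrs, max_impurity, min_size))
    return leaves
```

Here each email is a dict of categorical structural attributes (for instance language or attachment type, both hypothetical), and every leaf groups emails that share the same values on the attributes used along its path.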